{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# DashVector\n",
    "\n",
    "> [DashVector](https://help.aliyun.com/document_detail/2510225.html) is a fully-managed vectorDB service that supports high-dimension dense and sparse vectors, real-time insertion and filtered search. It is built to scale automatically and can adapt to different application requirements.\n",
    "\n",
    "This notebook shows how to use functionality related to the `DashVector` vector database.\n",
    "\n",
    "To use DashVector, you must have an API key.\n",
    "Here are the [installation instructions](https://help.aliyun.com/document_detail/2510223.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Install"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "!pip install dashvector dashscope"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "We want to use `DashScopeEmbeddings` so we also have to get the Dashscope API Key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "pycharm": {
     "name": "#%%\n",
     "is_executing": true
    },
    "ExecuteTime": {
     "end_time": "2023-08-11T10:37:15.091585Z",
     "start_time": "2023-08-11T10:36:51.859753Z"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import getpass\n",
    "\n",
    "os.environ[\"DASHVECTOR_API_KEY\"] = getpass.getpass(\"DashVector API Key:\")\n",
    "os.environ[\"DASHSCOPE_API_KEY\"] = getpass.getpass(\"DashScope API Key:\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "pycharm": {
     "name": "#%%\n",
     "is_executing": true
    },
    "ExecuteTime": {
     "end_time": "2023-08-11T10:42:30.243460Z",
     "start_time": "2023-08-11T10:42:27.783785Z"
    }
   },
   "outputs": [],
   "source": [
    "from langchain.embeddings.dashscope import DashScopeEmbeddings\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.vectorstores import DashVector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "pycharm": {
     "is_executing": true,
     "name": "#%%\n"
    },
    "ExecuteTime": {
     "end_time": "2023-08-11T10:42:30.391580Z",
     "start_time": "2023-08-11T10:42:30.249021Z"
    }
   },
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "\n",
    "loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "embeddings = DashScopeEmbeddings()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "We can create DashVector from documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "dashvector = DashVector.from_documents(docs, embeddings)\n",
    "\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "docs = dashvector.similarity_search(query)\n",
    "print(docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "We can add texts with meta datas and ids, and search with meta filter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "ExecuteTime": {
     "end_time": "2023-08-11T10:42:51.641309Z",
     "start_time": "2023-08-11T10:42:51.132109Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Document(page_content='baz', metadata={'key': 2})]\n"
     ]
    }
   ],
   "source": [
    "texts = [\"foo\", \"bar\", \"baz\"]\n",
    "metadatas = [{\"key\": i} for i in range(len(texts))]\n",
    "ids = [\"0\", \"1\", \"2\"]\n",
    "\n",
    "dashvector.add_texts(texts, metadatas=metadatas, ids=ids)\n",
    "\n",
    "docs = dashvector.similarity_search(\"foo\", filter=\"key = 2\")\n",
    "print(docs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [],
   "metadata": {
    "collapsed": false
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
